Introduction

Pseudorandom generators are used ubiquitously in cryptography in order to overcome the deterministic limitations of computers and generate good enough pseudorandomness from true randomness.

An algorithm which fulfils the task of generating more bits from a smaller number of bits is called a generator.

Definition: Generator

A generator is an efficient algorithm where which takes a binary string as an input and produces a longer binary string as an output.

A generator which takes a short string of random bits, called a seed, and expands them into a larger string of pseudorandom bits is called a pseudorandom generator (PRG).

Definition: (Secure) Pseudorandom Generator (PRG)

A (secure) pseudorandom generator is a generator such that for every input, called a seed, and every efficient statistical test , the output is pseudorandom, i.e. it holds that

for some negligible .

The set is called the seed space and the set is called the output space.

Definition Breakdown

This definition tells us that an algorithm which takes a uniformly chosen binary string of length (i.e. "truly random" string), called a seed, and outputs a longer binary string of length , is a pseudorandom generator if there is no efficient statistical test which can distinguish between 's output and a string chosen uniformly at random from the output space with non-negligible probability.

Essentially, the definition says that the probability that any statistical test thinks a string generated by is random is approximately equal to the probability that the same statistical test thinks a string uniformly chosen from is random, i.e.

It does not matter if you understand the nitty-gritty details of this definition for the security of a pseudorandom generator because it is one of the most useless pieces of information you will encounter in your lifetime. The reason for this is that there is no known PRG which has been proven to satisfy this definition because being able to prove it means that one is able to prove that .

Nevertheless, it gives us an idealised model for what a secure PRG should be.

Determining the Security of a PRG

We can derive some properties from the definition of a PRG which can hint that a candidate PRG is secure and can be trusted.

PRG Properties: Unpredictability

Security

A secure PRG is unpredictable in the sense that there is no algorithm which given the first bits of the output of can guess what the bit would be with probability that is non-negligibly greater than . Similarly, an unpredictable PRG is secure.

Proof: Unpredictability

Security

We are given a secure PRG and need to prove that it is unpredictable. Suppose, towards contradiction, that is predictable, i.e. there exists is an index and an efficient algorithm which when given the bits of the output of can guess the bit with probability for some non-negligible (yes, even if the algorithm works for a single position in the output, then the PRG is predictable). We define the following statistical test (or distinguisher)

Essentially, outputs 1 if the algorithm guesses correctly. If the string is chosen from a uniform distribution, i.e. , then the algorithm has no information and cannot guess the bit with any probability better than . On the other hand, if is generated by , then the algorithm can guess with probability which means that there is a statistical test which can differentiate between a string generated by and a truly uniformly chosen string which contradicts the original assumption that is secure.

For the other direction, we are given a generator that is unpredictable for all positions . We want to prove that is secure. We will denote by a string of which the first bits were generated using and the rest bits were chosen according to a uniform distribution. We denote the distribution of such strings as . It is clear that is the uniform distribution and is . We need to show that for all .

Suppose, towards contradiction, that for some ,, i.e. there is a distinguisher such that

for some non-negligible .

TODO

Unfortunately, these two properties only provide a potential way to rule out an PRG as insecure. Proving that a PRG is unpredictable equally as difficult as proving that a PRG is secure, since it is essentially an equivalent definition for the security of a PRG.

Leap of Faith

At the end of the day we just assume that secure generators exists. In fact, we have many PRGs that we believe to be secure but are just unable to prove it. Similarly, we have many PRGs that have been shown to be insecure and should not be used. So really, we consider a PRG to be secure until someone comes along and shows a way to break it. Since we have no better alternative, i.e. we do not know how to prove that a PRG is secure, we are forced to take the leap of faith and make-do with what we have.

Nevertheless, in order to be as safe as possible, one needs to make as few assumptions as possible and indeed that is what cryptography does. The only assumption regarding the existence of secure PRG which cryptography makes is the following.

Assumption: Existence of a Secure PRG

There exists a secure which takes a seed of length and produces a pseudorandom string with length .

This assumption has neither been proven nor refuted, however there is a lot of evidence supporting it (and it better be true because cryptography falls apart otherwise). Okay, but this assumption in itself does not seem particularly helpful, for it only allows us to produce a pseudorandom string which is one bit longer than its random seed - we have really only gained 1 bit of randomness. Fortunately, it turns out that if we assume this to be true, this PRG can actually be used to construct a new PRG which takes a seed of length and produces an output of any length we might want.

Let's see how we can do this. We are given a pseudorandom generator and want to use it to create a new generator ) which can use the same seed to produce a pseudorandom string whose length is arbitrary. This is actually pretty simple. First, one feeds the seed to the generator which will output a string of length . We can take the last bit of and use it as the first bit of the output of . Taking 1 bit from reduced its length to , so we can use it as input to once again. We repeat the process times until the bits output by at each step form a string of length .

And here is a implementation in pseudo-code:

#![allow(unused)]
fn main() {
fn GenT(seed: str[S]) -> str[T] {
	let y: str[T]; // Initialise the output y
	let current_seed = seed;
	
	let i = 0;
	while i < T {
		let pseudorandom_str = Gen(current_seed); // Get the output of Gen from the current seed
		y[i] = pseudorandom_str[S]; // Use the last bit of Gen's output for the current bit of the output y; the last bit is at index (S + 1) - 1 = S
		current_seed = pseudorandom_str[0..S] // The new seed is the output of Gen without the last bit
	}
	
	return y;
}
}

This algorithm provides us with a generator that can produce a string of any length given a seed of length . Well, there is actually one restriction - must be equal to for some polynomial . Otherwise the above algorithm would take non-polynomial time to execute - it would not be efficient.

Proof: Security of GenT

We are given the algorithm with seed space and we need to prove that it is secure.

Let's introduce some notation. We denote by a string whose first bits were chosen according to the uniform distribution over and whose remaining bits were generated by the same algorithm that uses with some seed . We denote by the distribution of strings generated in this way. Therefore, is the distribution obtained by sampling the seed space and outputting only , for there are no bits chosen from a uniform distribution in this case. Conversely, denotes the uniform distribution over because no bits are generated by the same algorithm that uses.

We need to show that which can be done by using the Randomness and just showing that for all .

Suppose, towards contradiction, that there is a statistical test and some such that

for some non-negligible .

We now construct an algorithm which will interpret the as an output of at some stage which used a seed which we will call . This output is comprised of a seed for the next stage, i.e. , and one output bit, . Subsequently, generates a string of length . The bits are chosen according to a uniform distribution, the algorithm then copies the bit into and finally it generates the bits by using the same process as does, utilising as the initial seed. At the end, will simply return .

#![allow(unused)]
fn main() {
fn D'(y: str[S+1]) -> bit {
	let z: str[T];
	
	for (let j = 0; j < S; ++j) { // i is the constant for which we assume that H_i is distinguishable from H_(i+1)
		z[j] = random_bit(); // Initialise the first i bits, i.e. z[0],...,z[i-1] to uniformly random values
	}
	z[S] = y[S]; // copy the i-th bit from y
	
	let current_seed = y[0..S]; // Interpret the first i bits of y as the initial seed
	
	for (let j = S + 1; j < T; ++) { // Execute the same algorithm as GenT to generate the remaining bits of z
		let pseudorandom_str = Gen(current_seed);
		z[j] = pseudorandom_str[S];
		current_seed = pseudorandom_str[0..S]
	}
	return D(z); // Return whatever value D will give for the string z;
}
}

If is efficient, then is also clearly efficient, so we need not worry about this anymore. Now, if is chosen according to a uniform distribution, then will feed into the string which would be distributed according to with , since the bits are generated according to a random distribution, the bit is copied from , which was in itself chosen according to a random distribution, and the rest of the bits are generated also generated by . On the other hand, if was the output of for some seed , then will feed into the string which would distributed according to , since the bits are generated according to a random distribution, is copied from , which was generated by , and the rest of the bits are generated by . Under our assumption it follows that

But this contradicts the security of .

Note

One might think that is also a requirement, because otherwise the algorithm will execute more than steps and would thus require more than seeds for all these steps which means that it will start repeating seeds, thus making it predictable. However, the requirement that is polynomial takes care of that - for a given , the constants required to make the polynomial greater than are so ridiculously huge and grow so mind-bogglingly fast that they can be considered infinite. Besides, it is unlikely that you want to produce a googol bits from a 128-bit seed.